Semi-Supervised Active Learning in Graphical Domains
Authors
Abstract
In a traditional machine learning task, the goal is to train a classifier using only labeled data (feature/label pairs) so that it generalizes to completely new data. Unfortunately, in many cases the labeled instances needed for training are difficult, expensive, or time-consuming to obtain, not least because a human supervisor usually has to annotate a large amount of data to assemble a significant training set. Moreover, in many cases we are not interested in generalizing to arbitrary unseen examples; we only need to discover the labels of a large quantity of unlabeled but already available data by using a small subset of labeled data. When a scenario involves both of these conditions, a semi-supervised learning algorithm can be used to solve the classification problem. Semi-supervised learning algorithms combine a large amount of unlabeled data with a small available set of labeled data to build a reliable classifier. One sub-class of these algorithms is of particular interest: graph-based semi-supervised learning. In this framework the data are represented as a graph whose nodes are the labeled and unlabeled examples in the dataset and whose edges are added according to a given similarity relationship between pairs of examples. Graph-based methods share the properties of being nonparametric, discriminative, and transductive. A crucial issue, however, is that the number of labeled (supervised) data points is very small compared with the number of unlabeled points, so it is essential that the labeled examples be representative. In some cases the labeled data points are given, but there are also many scenarios in which we have only a set of unlabeled data points and can choose a limited number of them to build the labeled data set.
In this paper we propose a graph-based semi-supervised active learning algorithm based on a reasonable choice of labeled data points, in order to improve classification accuracy. In a typical semi-supervised learning problem we consider a set of $n$ data points $X = \{x_1, x_2, \dots, x_n\}$ defined on a $d$-dimensional feature space, such that each point $x_i \in \mathbb{R}^d$, and a set of labels $Y = \{+1, -1\}$. We have an oracle function $h$ that gives the correct label for every data point to which it is applied. So …
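The setting described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. It assumes a dense Gaussian similarity graph, a placeholder query rule (`choose_queries`, a hypothetical helper), and standard harmonic-function label propagation. The names `similarity_graph`, `choose_queries`, `propagate`, the parameter `sigma`, and the synthetic two-cluster data are all assumptions made for illustration; only $X$, $Y = \{+1, -1\}$ and the oracle $h$ come from the text.

```python
import numpy as np

def similarity_graph(X, sigma=1.0):
    """Dense graph with weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def choose_queries(W, budget):
    """Placeholder selection rule (an assumption, NOT the paper's rule):
    start from the highest-degree node, then repeatedly add the node that
    is least similar to every node chosen so far."""
    chosen = [int(np.argmax(W.sum(axis=1)))]
    while len(chosen) < budget:
        closeness = W[:, chosen].max(axis=1)
        closeness[chosen] = np.inf  # never pick the same node twice
        chosen.append(int(np.argmin(closeness)))
    return np.array(chosen)

def propagate(W, labeled, y_labeled):
    """Harmonic-function label propagation: solve L_uu f_u = W_ul f_l."""
    n = W.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    laplacian = np.diag(W.sum(axis=1)) - W
    L_uu = laplacian[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]
    f = np.empty(n)
    f[labeled] = y_labeled
    f[unlabeled] = np.linalg.solve(L_uu, W_ul @ y_labeled)
    return np.where(f >= 0, 1, -1)  # threshold scores into Y = {+1, -1}

# Synthetic data (assumption): two well-separated clusters in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3.0, 0.0], 0.5, size=(20, 2)),
               rng.normal([3.0, 0.0], 0.5, size=(20, 2))])
y_true = np.array([+1] * 20 + [-1] * 20)
h = lambda i: y_true[i]  # the oracle: returns the correct label on request

W = similarity_graph(X)
labeled = choose_queries(W, budget=2)            # points sent to the oracle
y_labeled = np.array([h(i) for i in labeled], dtype=float)
y_pred = propagate(W, labeled, y_labeled)        # transductive predictions
```

The sketch is transductive in the sense used above: it assigns labels only to the points already present in the graph and produces no classifier for unseen data.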
Similar resources
Few-Shot Learning with Graph Neural Networks
We propose to study the problem of few-shot learning with the prism of inference on a partially observed graphical model, constructed from a collection of input images whose label can be either observed or not. By assimilating generic message-passing inference algorithms with their neural-network counterparts, we define a graph neural network architecture that generalizes several of the recentl...
Inducing Interpretable Representations with Variational Autoencoders
We develop a framework for incorporating structured graphical models in the encoders of variational autoencoders (VAEs) that allows us to induce interpretable representations through approximate variational inference. This allows us to both perform reasoning (e.g. classification) under the structural constraints of a given graphical model, and use deep generative models to deal with messy, high...
Semi-supervised and Active Training of Conditional Random Fields for Activity Recognition
Automated human activity recognition has attracted increasing attention in the past decade. However, the application of machine learning and probabilistic methods for activity recognition problems has been studied only in the past couple of years. For the first time, this thesis explores the application of semi-supervised and active learning in activity recognition. We present a new and efficie...
A Hierarchical Graphical Model for Record Linkage
The task of matching co-referent records is known among other names as record linkage. For large record-linkage problems, often there is little or no labeled data available, but unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for...